-
Notifications
You must be signed in to change notification settings - Fork 599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add instrumentation for AWS Lambda Service - Implementation (Part 2/2) #777
Add instrumentation for AWS Lambda Service - Implementation (Part 2/2) #777
Conversation
16ac59f
to
ce122d7
Compare
ce122d7
to
c79c369
Compare
c079d96
to
06c3b96
Compare
with tracer.start_as_current_span( | ||
name=orig_handler_name, | ||
context=parent_context, | ||
kind=SpanKind.SERVER, | ||
) as span: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #739 (comment) @owais said:
What if the lambda was triggered by SQS/SNS/Kinesis/S3/etc? It would be a consumer span in that case.
The spec does say the following:
For all events, a span with kind SERVER MUST be created corresponding to the function invocation unless stated otherwise below.
It didn't seem too controversial for the original PR that added it to the specs, comments like open-telemetry/opentelemetry-specification#1442 (comment) were just making sure downstream span were labeled as CONSUMER
.
I think the intention is that Lambda spans should always be treated as a "Server" which handles requests even if it is triggered from an SQS publish. The concept of linking traces between SQS and Lambda is a very difficult problem we have been trying to solve at AWS as noted in aws/aws-xray-sdk-go#218 and several other linked issues.
That's why I believe the spec has said a blanket statement that Lambda spans should always be SERVER spans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be I need to read it in more detail but it says "unless stated otherwise below." and then in SQS section, it says,
The span kind for both types of SQS spans MUST be CONSUMER.
I'll review this and the spec again in detail tomorrow. If you get other approvals by then please go ahead with the merge as we can update this later if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right it does say that...
Hm interesting, reading that sentence you pointed out I guess you could connect the main paragraph:
... SQS ... triggers a Lambda function. ... We consider processing both of a batch (1) and of each individual message (2). The function invocation span MUST correspond to the SQS event, which is the batch of messages. ** For each message, an additional span SHOULD be created to correspond with the handling of the SQS message. **
to this sentence:
The span kind for both types of SQS spans MUST be CONSUMER.
and understand the following:
The spec says the instrumentation has to have the following spans:
- The overall Lambda span
- The SQS span
- Batch messages
- Indidivdual messages
So I guess the question is does both types of SQS spans
refer to the Lambda function span triggered for batch/individual SQS messages or does it refer to the SQS span created in the user code to handle the batch/individual SQS spans.
The "For all events" in "For all events, a span with kind SERVER MUST be created corresponding to the function invocation unless stated otherwise below." is what makes me think it is the latter. But yeah like you said there is the unless stated otherwise below.
But yes the goal of this PR is just to bring over the code from opentelemetry-lambda
so hopefully we can make improvements as we go and while it's still in a beta stage 🙂
Please let me know if this makes sense to you! Also if @anuraaga has any input that would be super helpful! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I understood from it was that ONLY two spans should be created SQS batches and both should be of type CONSUMER. I think in case of SQS there shouldn't be any SERVER span especially if the Lambda is triggered periodically or by SQS itself. Overall it feels weird to have a PRODUCER <> SERVER
span pair and then have a CONSUMER
span as a child of the server span. I think a lot of APM backends will find that strange. Let's wait for @anuraaga to chime in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if the spec wording isn't too clear - indeed the intent is when handling SQS, for there to only be 1-2 (one of the spans generally needs to be created by the user) spans of type CONSUMER and no spans of type SERVER. There are some examples at the bottom which may also clear things up, though I can't say the ascii art I wrote renders very well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @anuraaga that makes a lot of sense! I couldn't find any similar implementations in the other languages on the opentelemetry-lambda repo so I went ahead with this idea for determining if the Lambda was triggered from SQS:
Based on the triggering Lambda from SQS docs, the Lambda event will have the key "Records"
if triggered from SQS. So in 8bf9147 I could check for it like this:
# See more:
# https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
if (
lambda_event
and isinstance(lambda_event, dict)
and "Records" in lambda_event
):
span_kind = SpanKind.CONSUMER
else:
span_kind = SpanKind.SERVER
I think we could also merge without this, and the suggestion is new, but shouldn't be a harm to try it out 🙂
@owais Please let me know if you think this is a good solution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think checking for looking for events['Records"][0]["event_source"] in ["aws:sqs"]
is probably good enough but let's open an issue for this and solve it later in order to not hold this PR any longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thanks for that, I think that's a great suggestion! I changed it to this:
# See more:
# https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
try:
if lambda_event["Records"][0]["eventSource"] == "aws:sqs":
span_kind = SpanKind.CONSUMER
except Exception: # pylint: disable=broad-except
span_kind = SpanKind.SERVER
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java is lucky to be able to just check instanceof on the actual types of arguments. This sort of attribute matching approach would be needed for nodejs though, where we haven't implemented the SQS handling yet, so there is indeed precedence for starting with only HTTP support and following up with SQS ;) But looks like you have a reasonable solution here.
|
||
tracer = get_tracer(__name__, __version__, tracer_provider) | ||
|
||
with tracer.start_as_current_span( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #739 (review) @owais said:
Q: How would this work if the lambda uses a framework like flask or falcon? Wouldn't that result in two SERVER spans?
I think this is related to #777 (comment) below.
I think it will return 2 SERVER
spans but that's what we expect? AWS Lambda is a short lived easy SERVER right? So it would have as much sense as running a flask app within another flask app? 😅 Please correct me if I'm wrong but that's my understanding, in which case 2 SERVER
span would be expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that would be the ideal outcome but this is not a problem specific to lambda as we have the same issue with uwsig/gunicorn + flask/django/etc. We can solve it separately.
a182761
to
8bf9147
Compare
# See more: | ||
# https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html | ||
try: | ||
if lambda_event["Records"][0]["eventSource"] == "aws:sqs": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we code this defensively or catch only the expected exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm sure we can do that! Although it's such simple code I can't think of any other exception that should occur?
Either way I updated it to look like this based on my tests of what could go wrong 🙂 Please let me know what you think!
# See more:
# https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
try:
if lambda_event["Records"][0]["eventSource"] == "aws:sqs":
span_kind = SpanKind.CONSUMER
except (IndexError, KeyError, TypeError):
span_kind = SpanKind.SERVER
c221f49
to
5bf66d2
Compare
5bf66d2
to
f84c263
Compare
Description
Follow up to #739. In this PR, we add the Python code to actually implement instrumentation.
This PR adds Instrumentation for Lambda functions on the AWS Lambda service. Specifically, when you use AWS Lambda to call your
def lambda_handler(lambda_event, lambda_context)
on AWS, you can use this instrumentation package and its scripts to instrument that invocation with OpenTelemetry!Please read the description in #739 to learn more.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
The unit tests cover the manual instrumentation scenarios of
Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.